Travel time optimization on multi-AGV routing by reverse annealing

Quantum annealing has been actively researched since D-Wave Systems produced the first commercial machine in 2011. Controlling a large fleet of automated guided vehicles is one of the real-world applications utilizing quantum annealing. In this study, we propose a formulation to control the traveling routes to minimize the travel time. We validate our formulation through simulation in a virtual plant and authenticate the effectiveness for faster distribution compared to a greedy algorithm that does not consider the overall detour distance. Furthermore, we utilize reverse annealing to maximize the advantage of the D-Wave’s quantum annealer. Starting from relatively good solutions obtained by a fast greedy algorithm, reverse annealing searches for better solutions around them. Our reverse annealing method improves the performance compared to standard quantum annealing alone and performs up to 10 times faster than a commercial classical solver, Gurobi. This study extends a use of optimization with general problem solvers in the application of multi-AGV systems and reveals the potential of reverse annealing as an optimizer.

are not always optimal, although it rapidly outputs relatively accurate solutions. As the environmental effect cannot be avoided, the quantum annealer is sometimes regarded as a simulator for the quantum many-body dynamics 19,20 . Several practical applications of quantum annealer have been presented across various fields, such as finance [21][22][23] , traffic 24,25 , logistics 26,27 , manufacturing 9,28 , and marketing 29 , as well as in decoding problems 30,31 . Its potential for solving the optimization problem with inequality constraints has been enhanced 32 , especially in the case that is hard to formulate directly 33 . The comparative study of quantum annealer has also been performed for benchmark tests to solve the optimization problems 34 . The quantum effect on the case with multiple optimal solutions has also been discussed 35,36 . Further, applications of quantum annealing for machine learning for solving optimization problems have been reported [37][38][39][40][41][42] .
The above-mentioned studies have presented a routing algorithm for multiple AGVs to plan their traveling routes for given pairs of source and destination by utilizing a quantum annealer 9 . The routing aims to improve the working rate of AGVs. This objective is translated into the optimization problem to maximize the total travel distance of AGVs with constraints for removing latent collisions. However, as challenges, AGVs move with unnecessary detours, which reduces delivery efficiency. In addition, depending on the system, AGVs' repeated back and forth movement or along a cycle sometimes lacks efficiency. Therefore, we aim to develop an algorithm to control AGVs more efficiently. In this study, we formulate a QUBO problem to output deconflicted routes closest to the destinations of AGVs by modifying the objective function discussed in the previous study 9 . For a given task for each AGV, we minimize the total detour distance from the shortest paths. To express the detouring distance, we introduce an estimated distance of AGV, which is the travel distance remaining from the end of the path to the destination. By reducing it for all AGVs, we compel them to move ahead to their destination at each iteration. Consequently, we expect this method to accelerate cargo-carrying capabilities by shortening the detour of AGVs.
To optimize the QUBO problem, we employ D-Wave's quantum annealer for solving the QUBO problem. The latest version, that is, the D-Wave Advantage System, has two features-forward and reverse methods for quantum annealing [43][44][45][46] . To reveal the effectiveness of applying D-Wave's machine to our optimization problem, we explore both forward and reverse annealing features, which have not been hitherto discussed in AGVs' routing applications. In short, forward annealing performs a global search, starting in a superposition of all possible states with equal weights. By contrast, reverse annealing searches locally around a given classical state. Reverse annealing refines a solution around the given classical state unlike forward annealing, which searches for all possible states equally. For some types of problems, the reverse annealing approach improves the performance by utilizing initial solutions obtained by forward annealing or classical algorithms 23,47 . In this study, we present a reverse annealing method combined with a fast greedy search for solving the QUBO problem. The problem is also solved via a linear programming solver called Gurobi 48 . To evaluate the performance, we measure the time-to-solution benchmark for each solver.
The remainder of this paper is organized as follows: In the next section, we first review the previous method for the control of AGVs with less congestion. Next, we formulate the control of AGVs with fewer detours as the QUBO problem by modifying the objective function. Then, we present the optimization method that utilizes reverse annealing. To obtain the initial solution, we construct the greedy algorithm for the preprocessing of the reverse annealing method. In the following section, we evaluate our method through a simulation in a virtual plant. To validate the effectiveness of D-Wave's machine, we compare the results of the obtained solutions by each solver. In the final section, we summarize our study and discuss how future studies can improve the performance of solutions to this problem.

Method
In this section, we introduce our method for controlling AGVs using a quantum annealer. First, we review the study by Ohzeki et al. 9 . We briefly explain the routing algorithm and definition of the optimization problem and specify its issues. Then, we redefine the optimization problem to minimize the total journey time of AGVs. The optimization problem is relaxed to a QUBO form, which is heuristically solved by a quantum annealer. Finally, we describe how to run reverse annealing for the QUBO problem.
Assume that loads assigned to AGVs and their start and target positions are given. In the routing algorithm, we consider finding a route for each AGV to carry its load to its destination node from the source node in the network. We call a pair of source and destination nodes a task. A static routing algorithm determines a route for each task in advance, whereas a dynamic algorithm decides the route based on real-time information. A dynamic algorithm is advantageous as the routes are flexibly adjusted to reduce congestion and travel time.
For using a quantum annealer for AGVs' routing, a dynamic algorithm with QUBO optimization has been proposed 9 . The algorithm focuses on improving the working rate of AGVs. For dynamically updated tasks, the algorithm provides a good route for each AGV during a fixed period T. First, for each AGV, we generate multiple dissimilar candidate routes from its current node v 0 to destination v t and shorten them to enable reaching within T s. Each candidate route is expressed as a list of nodes such that it connects x to t on the network. Second, we divide each candidate route into lists of segments to implement AGVs' stopping on the route. For example, if one candidate route consists of a path (v 0 , v 1 , v t ) , then we split it into lists (v 0 , v 1 , v t ) , (v 0 , v 1 ) , and (v 0 ) , which represent moving the AGV two segments ahead, one segment ahead, and stopping, respectively. Note that in this algorithm, AGVs cannot go ahead during T second once they stop. Third, for all AGVs, we find the routes that maximize the working rate from the candidate routes. Finally, we move AGVs along the selected routes in T second and iterate the above procedure.
The procedure of choosing the best route is regarded as an optimization problem to maximize AGVs' working rate. The optimization problem comprises an objective function to be maximized or minimized with constraints. To represent the working rate, the objective function is defined as the total distance that AGVs move in the www.nature.com/scientificreports/ network. By maximizing the objective function, the total distance of routes is maximized and the solution leads to the highest working rate in the AGVs' system. However, we have two issues with the objective function of the algorithm. One is inducing a longer detour distance, while the desired AGV's route should be the shortest possible to ensure fast delivery. The other is the possible suppressing of the system because of AGV's continuous back and forth movement in certain areas. To overcome these challenges, we present a new definition of the optimization problem to minimize travel time.
The travel time is proportional to the route distance and can be directly used for the objective function. However, recalling that we use the divided routes and not all of them reach the destination nodes, the length of the shortest route equals zero and that causes all AGVs to stop. In this study, we introduce the remaining distance instead of the route distance. For a given i-th AGV and j-th route, the remaining distance d * i,j is the length of the shortest path from the end node of the route to its destination. By choosing the shortest route in terms of the remaining distance, AGVs move to nodes closest to their destination nodes for each iteration in the algorithm. Therefore, the algorithm greedily minimizes AGVs' travel time to complete their tasks. Let q ij be a binary variable that takes a value 1 if the i-th AGV moves on the j-route and 0 otherwise. Then, the objective function is defined as where N is the number of AGVs and M i is the number of candidate routes for the i-th AGV.
To safely control AGVs, collisions on roads or at intersections must be avoided. Deconfliction can be implemented by slowing down or stopping the AGVs after the traveling routes are arranged; however, we perform it simultaneously during the optimization. We utilize a similar technique discussed in the previous study to define constraints for collision avoidance 9 . We allow at most one AGV to occupy each line segment and intersection. We define F i,j,t,e such that F i,j,t,e = 1 if the segment e is occupied with the j-th route of the i-th AGV at time t and F i,j,t,e = 0 otherwise. Note that if an AGV is on a segment uv, we consider any segment coming to v as occupied. For any edge e and time t, N i=1 M i j=1 F i,j,t,e q i,j has to be at most one. Now we define a mixed integer linear programming (MILP) problem as where E represents a set of edges in the network. The first constraint ensures that each AGV picks a single route and the second one ensures deconflicted routes. In general MILP problems, finding the exact solution quickly is difficult, although classical methods such as branch-and-bound algorithm are utilized to traverse the solution space efficiently 49 .
Next, we transform the MILP problem into a QUBO problem to utilize quantum annealing. By using the penalty method for the equality and inequality constraints in problem (2), the cost function to be minimized in the QUBO problem can be expressed as: where a > 0 and b > 0 denote penalty parameters that are tuned to ensure that the constraints are satisfied with the minimum energy. The second and third terms represent the equality and inequality constraints in problem (2), respectively. Thus, we have the following quadratic form with a matrix Q by transforming the function (3).
General QUBO and MILP problems are known to be hard to strictly optimize. As QUBO problems do not have any constraints, the size of their feasible solution space tends to be fairly larger than that of MILP problems for the same problem size. Thus, heuristic algorithms are widely applied for quickly obtaining an adequate solution. In this study, we employ D-Wave's quantum annealer to solve the QUBO problem heuristically.
QUBO problems are equivalently converted to minimizing the energy of the Ising model, for which the Hamiltonian is expressed as: where σ i ∈ {−1, 1} denotes a spin variable.
In quantum annealing, the quantum fluctuation is utilized to effectively find solutions of Ising models. The D-Wave machine operates the quantum system with superconducting qubits in a transverse field and its Hamiltonian is expressed as: Depending on the annealing schedule, quantum annealing is classified into forward and reverse annealing. Forward annealing starts with s = 0 and gradually increases to s = 1 in the predetermined annealing time. By gradually decreasing the magnitude of the transverse field, the qubits are dephased into a nontrivial classical state of the Ising system. In an ideal procedure of forward annealing, the optimal state of the Ising model is known to be attained. However, D-Wave's quantum annealer couples to an open system that is affected by its environment, which leads to a limited optimization performance.
Unlike forward annealing, which is initialized with the full superposition, reverse annealing starts with a classical state and searches for a better solution in its vicinity. In reverse annealing, the system starts with s = 1 and gradually decreases s to s = 1 − r , where 0 < r < 1 is the reversal distance, which determines the strength of transverse field. Pausing the schedule at s = 1 − r for a certain period exploits the thermal relaxation in a low-temperature bath and enhances the optimization performance. After the pausing, the system is annealed forward to s = 1 and the qubits are dephased into a classical state.
In this study, we employ reverse annealing mainly for two reasons. One is to improve the performance of optimizations by combining a fast classical heuristic algorithm. The other is to ensure the feasibility of solutions initially obtained by some classical algorithms. Both forward and reverse annealing have no assurance of the output precision and infeasible solutions could be obtained, which induce fatal errors in manufacturing systems. Thus, an alternative technique to find feasible solutions is necessary for practical use, which can be utilized if the annealing method fails to find feasible solutions. Furthermore, reverse annealing provides an opportunity to refine such feasible solutions and exceed the performance of forward annealing alone.
We introduce a greedy algorithm to derive feasible solutions and utilize them as initial states for reverse annealing. The simplified flow of the algorithm is as follows.
1. Allocate the shortest route to each vehicle. 2. For any one of the two vehicles that collide, assign the next route. 3. Iterate 2 until all collisions are removed.
In the algorithm, the selections of any two conflicting routes rely on the calculation of an impact value, which is the number of conflicting AGVs for their next candidate routes. One route with a smaller impact value is dismissed and the other route with a larger impact value is selected. By exploring the candidate routes in the ascending orders of the remaining distance, the algorithm greedily searches the solutions. As the candidate routes are separated to stop on the way, the algorithm always finds conflict-free routes. This algorithm runs much faster than solving the MILP or QUBO problems. Specifically, the number of iterations in the algorithm is O(nm), where n and m are the number of AGVs and candidate routes, respectively.
The greedy algorithm does not consider any detour distance in the selection of routes and aims to quickly find the deconflicted routes, which results in exploiting the high working rate. This feature enables us to imitate conditions similar to those in the previous study. The working rate of AGVs is also preferred to be high in the AGV systems to avoid traffic congestion. In the next section, we compare the algorithms in terms of the working rate and carrying efficiency.
To exploit the performance of reverse annealing, we maintain the annealing schedule properly. The most dominant parameter is reversal distance r. If it is too small, the solution stays in the initial state and if it is too large, the probability of achieving a global minimum by performing the global search equivalently as forward annealing is low. To find the reversal distance with a high possibility to output optimal solutions, for 10 randomly chosen QUBO matrices during the algorithm for controlling 20 AGVs, we analyzed the solutions for different reversal distances. The result is illustrated in Fig. 1. Obtaining the same initial state had a high probability with a small reversal distance, which gradually decreases as reversal distance increases. When reversal distance is too large, reverse annealing fails to find optimal states in most cases. Overall, the count of obtaining optimal solutions peaks around r = 0.45 . Thus, we find the reversal distance 0.45 as the probability of obtaining the ground states peaked around the value and the probability of obtaining the same states is small. Finally, we have a calibrated reverse annealing schedule shown in Fig. 2. The default annealing time of D-Wave's quantum annealer is 20 µs and we use this setting in forward annealing. We set the annealing time in reverse annealing to 13.3 µs with a pause of 10 µs , which is shorter than that in forward annealing.

Results
In this section, we report the results of the dynamic algorithm solving the QUBO problem at each time. To verify the effectiveness of the algorithm, we simulate the AGV system in a simple virtual plant, as illustrated in Fig. 3. In the plant, 20 AGVs are active and they are provided a list of tasks that have to be completed at the earliest. In fixed simulation time, the number of completed tasks corresponds to the efficiency of carrying, which is also evaluated by the total travel time to finish the fixed number of tasks. The speed of each AGV is set to 0.5 m/s. We iterate the algorithm 500 times and observe the number of tasks completed and the working rate. The time interval T per iteration is set by 2 s. In the simulation, different solvers are utilized to obtain the solution to each QUBO problem, specifically, greedy algorithm, Gurobi, forward annealing, and reverse annealing. As mentioned above, the greedy algorithm attempts to avoid congestion by reducing the number of AGVs stopping.  Table 1. At each optimization, Gurobi always outputs the optimal solution and leads to the highest performance in completed tasks. Although the working rate of the greedy algorithm was higher than that of Gurobi, the completed tasks of the greedy algorithm were less than that of Gurobi. This is because AGVs move with unnecessary detours in the greedy algorithm, which indicates that travel time is reduced by solving our optimization problem. In our case, forward annealing performed worst both in the completed tasks and working rate because of failing to find low-energy solutions. By contrast, reverse annealing outperformed forward annealing and resulted in almost the same scores as those of Gurobi. To conclude, the optimization of the QUBO problem 3 is effective in realizing a time-efficient multi-AGV system as well as solving the MILP problem. In addition, Gurobi and reverse annealing are comparable as solvers to control 20 AGVs.
To compare the performance of solvers, we benchmark time-to-solutions (TTS), defined as: where p is the probability of obtaining the optimal solution at least once with a fixed number of trials, p s is the probability of obtaining the optimal solution with a single trial, and t c is the computational time for a single trial. For example, TTS(0.99) shows the estimated amount of time to obtain the optimal solution with a 99% Figure 1. Calibration of reversal distance. We performed reverse annealing on 10 random nontrivial QUBO problems that appeared while running the algorithm with 20 AGVs and analyzed the energies of each of the 1000 samples. For a given reversal distance, the height of the red, green, and gray areas represents the mean probability of attaining the same state, the ground state, and the other state, respectively. The problem size is the summation of the number of candidate routes for each AGV. Note that the problem size differs in the algorithm because of separating routes to let AGVs wait. We calculate the TTS(0.99) for 10 QUBO problems appearing in the simulation. The number of samples in reverse annealing is set to 10, 000. The result is depicted in Fig. 4. We plot the wall-clock time for Gurobi to obtain the optimal solution instead of TTS. Reverse annealing outperforms Gurobi with almost 10 times shorter time when the problem size is small. As the problem size increased, reverse annealing needed a longer time to obtain the optimal solution, and Gurobi outperformed reverse annealing. Reverse annealing failed to find the optimal solution when the problem size was over 100 and benchmarking TTS was impossible. We assume this is because initial solutions obtained by the greedy algorithm become worse for larger problems.
To investigate the relationship between initial states and outputs in reverse annealing, we evaluate their closeness to optimal in terms of energy, which is measured by residual energy defined as: where E is the mean energy of samples and E min is the energy of the optimal solution. The residual energy corresponds to the closeness between the mean and minimum energy. We obtain 10,000 samples by forward and reverse annealing. The result is depicted in Fig. 5. The greedy algorithm always leads to a unique solution, and we plot the fraction of its energy over the minimum one. The greedy algorithm has extremely high residual energy as problem size increases. When the problem size is less than 50, the residual energy of forward annealing is higher than that of the greedy algorithm, which indicates that the greedy algorithm performs better than forward annealing. By contrast, reverse annealing resulted in lower residual energy even for small problems. In reverse annealing, the residual energy tends to be higher but still less than forward annealing as the problem size increases up to 215. For 250 variables, reverse annealing has higher residual energy than forward annealing and may turn worse for problem sizes larger than 250. Figure 3. Virtual plant we used in this study. The network shows the roads on which AGVs can move. Each circle represents a node, which is an intersection or a mid-point. Each arrow represents the directed connection between two nodes. The length of each arrow is one meter. The square nodes indicate the pick-up or drop-off points that can be the source and destination of AGVs. In the plant, 20 AGVs are active on their task.

Discussion
In this study, we formulated a QUBO problem to control multiple AGVs' routes with minimized detours for a given period. Using a simulation, we demonstrated that our algorithm yields time-efficient routing. However, controlling AGVs as simulated is not always possible because of unpredictable events such as human errors and communication delays. To test our methods under such practical conditions and realize robust routing algorithms is interesting.
As an optimization method, we utilized reverse annealing with initial states obtained by a fast greedy algorithm. We confirmed that our reverse annealing method has the potential to exceed forward annealing alone or even a classical commercial solver, Gurobi. Practically, reverse annealing performed up to 10 times faster than Gurobi to obtain optimal solutions for small problem sizes. We believe this superiority of reverse annealing over Gurobi is caused by the specialty in short-time sampling on D-Wave's quantum annealer. In contrast, forward annealing suffered in finding optimal solutions even with small problems. We reckon this unexpected behavior of D-Wave Advantage 1.1 is due to its unique characteristic. According to the technical report by D-Wave Systems, such behavior is not announced and the system seems to perform as expected 50 . Thus, D-Wave Advantage 1.1 may have a weakness for the types of problems that we seek to solve.
By contrast, reverse annealing incorporates the benefits of both the greedy algorithm and quantum annealing and seems to have stable performance. One possible approach to improve our reverse annealing methods is switching the initial algorithm from the greedy algorithm to forward annealing depending on their performance. The residual energy of the greedy algorithm is considerably higher than the minimum one, although reverse annealing corrects the initial state to a certain extent. Choosing the right initial solver for large problems will improve the performance of reverse annealing. One choice of the approximate solvers can be a mean-field approximation and its variants 51 . The performance of reverse annealing with the mean-field approximation has  www.nature.com/scientificreports/ been investigated 31 . Employing approximation solvers with a theoretical guarantee for setting initial conditions on reverse annealing can enable us to investigate the performance and design a theoretical method for assessing its limitation. As demonstrated in our study, although it is just one of the optimization problems, the performance of reverse annealing with the greedy algorithm is comparable with that of the commercial best optimizer. By assessing various combinations of reverse annealing and approximate solvers, we may find a new classical-quantum hybrid scheme to solve large-scale optimization problems. In this study, simulated annealing is beyond our scope because several studies pointed out the advantageous features of theoretical quantum annealing over simulated annealing. However, the current D-Wave's physical machine does not yet follow the ideal adiabatic condition, and simulated annealing possibly outperforms standard forward annealing. Hence, it is an open question that the current D-Wave's quantum annealer still performs better than simulated annealing, if both annealing time are set identically.

Data availability
The datasets used during the current study are available from the corresponding author on reasonable request.